Use of generalised additive models to categorise continuous variables in clinical prediction

Authors

  • Irantzu Barrio
  • Inmaculada Arostegui
  • José M Quintana
  • IRYSS-COPD Group
Abstract

BACKGROUND In medical practice, many essentially continuous clinical parameters tend to be categorised by physicians for ease of decision-making. Categorisation is common both in medical research and in the development of clinical prediction rules, particularly where the resulting models are to be applied in daily clinical practice to support clinicians in the decision-making process. Since the number of categories into which a continuous predictor should be divided depends partly on the relationship between the predictor and the outcome, the need for more than two categories must be borne in mind.

METHODS We propose a categorisation methodology for clinical prediction models, using Generalised Additive Models (GAMs) with P-spline smoothers to determine the relationship between the continuous predictor and the outcome. The proposed method consists of creating at least one average-risk category along with high- and low-risk categories based on the GAM smooth function. We applied this methodology to a prospective cohort of patients with exacerbated chronic obstructive pulmonary disease. The predictors selected were respiratory rate and partial pressure of carbon dioxide in the blood (PCO2), and the response variable was poor evolution. An additive logistic regression model was used to show the relationship between the covariates and the dichotomous response variable. The proposed categorisation was compared with the continuous predictor, taken as the best option, using the AIC and AUC as evaluation criteria. The sample was divided into derivation (60%) and validation (40%) samples. The former was used to obtain the cut points and the latter to validate the proposed methodology.

RESULTS The three-category proposal for the respiratory rate was ≤ 20; (20, 24]; > 24, for which the following values were obtained: AIC = 314.5 and AUC = 0.638. The respective values for the continuous predictor were AIC = 317.1 and AUC = 0.634, with no statistically significant difference between the two AUCs (p = 0.079). The four-category proposal for PCO2 was ≤ 43; (43, 52]; (52, 65]; > 65, for which the following values were obtained: AIC = 258.1 and AUC = 0.81. No statistically significant difference was found between the AUC of the four-category option and that of the continuous predictor, which yielded an AIC of 250.3 and an AUC of 0.825 (p = 0.115).

CONCLUSIONS Our proposed method provides clinicians with the number and location of cut points for categorising variables, and performs as successfully as the original continuous predictor when it comes to developing clinical prediction rules.
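As a rough illustration of how the reported cut points could be applied, the Python sketch below maps a measurement to its right-closed interval and compares the AUC of the categorised and continuous forms of a predictor. It is not the authors' implementation: the helper names (`categorise`, `label`, `auc`) and the eight-patient data set are hypothetical, no GAM is fitted, and the category index is used directly as an ordinal score, whereas the abstract describes using a logistic regression model.

```python
from bisect import bisect_left

# Cut points reported in the abstract; every interval is closed on the right.
RESP_RATE_CUTS = [20, 24]      # <= 20, (20, 24], > 24
PCO2_CUTS = [43, 52, 65]       # <= 43, (43, 52], (52, 65], > 65


def categorise(value, cuts):
    """Return the 0-based index of the right-closed interval containing value."""
    # bisect_left places a value equal to a cut point in the lower interval.
    return bisect_left(cuts, value)


def label(index, cuts):
    """Render a category index as interval text (hypothetical helper)."""
    if index == 0:
        return f"<= {cuts[0]}"
    if index == len(cuts):
        return f"> {cuts[-1]}"
    return f"({cuts[index - 1]}, {cuts[index]}]"


def auc(scores, outcomes):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count one half."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Made-up PCO2 values and outcomes (1 = poor evolution), for illustration only.
pco2 = [38, 43, 47, 52, 60, 65, 70, 80]
poor = [0, 0, 0, 1, 0, 1, 1, 1]

categories = [categorise(v, PCO2_CUTS) for v in pco2]
auc_continuous = auc(pco2, poor)
auc_categorised = auc(categories, poor)
```

On this toy data the categorised predictor loses a little discrimination relative to the continuous one, which is the kind of trade-off the AUC comparison in the abstract is designed to quantify.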


Similar resources

Modelling in landscape ecology – regionalisation by means of habitat modelling

Figure 1: Principle of habitat modelling: sampling of species distribution data (here: presence-absence data) and selected predictor variables; estimating an empirical, predictive model (in this case by logistic regression, a generalised linear model for binary response variables; other, more flexible approaches are generalised additive models, classification and regression trees, artificial ne...


Predictive Ability of Statistical Genomic Prediction Methods When Underlying Genetic Architecture of Trait Is Purely Additive

A simulation study was conducted to address the issue of how purely additive (simple) genetic architecture might impact on the efficacy of parametric and non-parametric genomic prediction methods. For this purpose, we simulated a trait with narrow sense heritability h2= 0.3, with only additive genetic effects for 300 loci in order to compare the predictive ability of 14 more practically used ge...


Prediction of potential habitat distribution of Artemisia sieberi Besser using data-driven methods in Poshtkouh rangelands of Yazd province

The present study aimed to model the potential habitat distribution of A. sieberi and its ecological requirements using a generalized additive model (GAM) and a classification and regression tree (CART) in the Poshtkouh rangelands of Yazd province. For this purpose, pure habitats of the species were delineated and the species presence data were recorded by the systematic-random sampling method. Us...


Estimating structural mean models with multiple instrumental variables using the generalised method of moments

Instrumental variables analysis using genetic markers as instruments is now a widely used technique in epidemiology and biostatistics. As single markers tend to explain only a small proportion of phenotypical variation, there is increasing interest in using multiple genetic markers to obtain more precise estimates of causal parameters. Structural mean models (SMMs) are semi-parametric models th...


Exploring the Use of Random Regression Models with Legendre Polynomials to Analyze Clutch Size in Iranian Native Fowl

Random regression models (RRM) have become common for the analysis of longitudinal data or repeated records on individuals over time. The goal of this paper was to explore the use of random regression models with orthogonal Legendre polynomials (RRL) to analyze a new repeated measure, clutch size (CS), as a meristic trait for Iranian native fowl. Legendre polynomial functions of increasing...



Journal title:

Volume 13  Issue 

Pages  -

Publication date 2013